{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "csv.ipynb",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.8"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "DweYe9FcbMK_"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Authors.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "form",
        "colab_type": "code",
        "id": "AVV2e0XKbJeX",
        "colab": {}
      },
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "sUtoed20cRJJ"
      },
      "source": [
        "# Muat data CSV"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "1ap_W4aQcgNT"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/load_data/csv\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />Lihat di TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/id/tutorials/load_data/csv.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Jalankan di Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/id/tutorials/load_data/csv.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />Lihat kode di GitHub</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/docs/site/id/tutorials/load_data/csv.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Unduh notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yPWYwGI7pIkH",
        "colab_type": "text"
      },
      "source": [
        "Note: Komunitas TensorFlow kami telah menerjemahkan dokumen-dokumen ini. Tidak ada jaminan bahwa translasi ini akurat, dan translasi terbaru dari [Official Dokumentasi - Bahasa Inggris](https://www.tensorflow.org/?hl=en) karena komunitas translasi ini adalah usaha terbaik dari komunitas translasi.\n",
        "Jika Anda memiliki saran untuk meningkatkan terjemahan ini, silakan kirim pull request ke [tensorflow/docs](https://github.com/tensorflow/docs) repositori GitHub.\n",
        "Untuk menjadi sukarelawan untuk menulis atau memeriksa terjemahan komunitas, hubungi\n",
        "[daftar docs@tensorflow.org](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "C-3Xbt0FfGfs"
      },
      "source": [
        "Tutorial ini memberikan contoh cara memuat data CSV dari file ke `tf.data.Dataset`.\n",
        "\n",
        "Data yang digunakan dalam tutorial ini diambil dari daftar penumpang Titanic. Model akan memprediksi kemungkinan penumpang selamat berdasarkan karakteristik seperti usia, jenis kelamin, kelas tiket, dan apakah orang tersebut bepergian sendirian."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "fgZ9gjmPfSnK"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "I4dwMQVQMQWD",
        "colab": {}
      },
      "source": [
        "try:\n",
        "  # %tensorflow_version hanya ada di Colab.\n",
        "  %tensorflow_version 2.x\n",
        "except Exception:\n",
        "  pass"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "baYFZMW_bJHh",
        "colab": {}
      },
      "source": [
        "from __future__ import absolute_import, division, print_function, unicode_literals\n",
        "import functools\n",
        "\n",
        "import numpy as np\n",
        "import tensorflow as tf"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "Ncf5t6tgL5ZI",
        "colab": {}
      },
      "source": [
        "TRAIN_DATA_URL = \"https://storage.googleapis.com/tf-datasets/titanic/train.csv\"\n",
        "TEST_DATA_URL = \"https://storage.googleapis.com/tf-datasets/titanic/eval.csv\"\n",
        "\n",
        "train_file_path = tf.keras.utils.get_file(\"train.csv\", TRAIN_DATA_URL)\n",
        "test_file_path = tf.keras.utils.get_file(\"eval.csv\", TEST_DATA_URL)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "4ONE94qulk6S",
        "colab": {}
      },
      "source": [
        "# Membuat nilai numpy agar lebih mudah dibaca.\n",
        "np.set_printoptions(precision=3, suppress=True)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Wuqj601Qw0Ml"
      },
      "source": [
        "## Memuat data\n",
        "\n",
        "Untuk memulai, mari kita lihat bagian atas file CSV untuk melihat formatnya.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "54Dv7mCrf9Yw",
        "colab": {}
      },
      "source": [
        "!head {train_file_path}"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "jC9lRhV-q_R3"
      },
      "source": [
        "Anda dapat [memuat ini menggunakan panda] (pandas.ipynb), dan meneruskan array NumPy ke TensorFlow. Jika Anda perlu meningkatkan ke file dengan skala besar, atau membutuhkan loader yang terintegrasi dengan [TensorFlow dan tf.data] (../../panduan/data.ipynb) kemudian gunakan fungsi `tf.data.experimental.make_csv_dataset`:"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "67mfwr4v-mN_"
      },
      "source": [
        "Satu-satunya kolom yang perlu Anda identifikasi secara eksplisit adalah kolom dengan nilai yang dimaksudkan untuk diprediksi oleh model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "iXROZm5f3V4E",
        "colab": {}
      },
      "source": [
        "LABEL_COLUMN = 'survived'\n",
        "LABELS = [0, 1]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "t4N-plO4tDXd"
      },
      "source": [
        "Sekarang baca data CSV dari file dan buat sebuah dataset.\n",
        "\n",
        "(Untuk dokumentasi lengkap, lihat `tf.data.experimental.make_csv_dataset`)\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "yIbUscB9sqha",
        "colab": {}
      },
      "source": [
        "def get_dataset(file_path, **kwargs):\n",
        "  dataset = tf.data.experimental.make_csv_dataset(\n",
        "      file_path,\n",
        "      batch_size=5, # Artifisial kecil untuk membuat contoh lebih mudah ditampilkan.\n",
        "      label_name=LABEL_COLUMN,\n",
        "      na_value=\"?\",\n",
        "      num_epochs=1,\n",
        "      ignore_errors=True, \n",
        "      **kwargs)\n",
        "  return dataset\n",
        "\n",
        "raw_train_data = get_dataset(train_file_path)\n",
        "raw_test_data = get_dataset(test_file_path)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "v4oMO9MIxgTG",
        "colab": {}
      },
      "source": [
        "def show_batch(dataset):\n",
        "  for batch, label in dataset.take(1):\n",
        "    for key, value in batch.items():\n",
        "      print(\"{:20s}: {}\".format(key,value.numpy()))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "vHUQFKoQI6G7"
      },
      "source": [
        "Setiap item dalam dataset adalah *batch*, direpresentasikan sebagai *tuple* dari (*banyak contoh*, *banyak label*). Data dari contoh-contoh tersebut disusun dalam tensor berbasis kolom (bukan tensor berbasis baris), masing-masing dengan elemen sebanyak ukuran batch (dalam kasus ini 5).\n",
        "\n",
        "Dengan melihat sendiri, mungkin akan membantu Anda untuk memahami."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "HjrkJROoxoll",
        "colab": {}
      },
      "source": [
        "show_batch(raw_train_data)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "YOYKQKmMj3D6"
      },
      "source": [
        "Seperti yang Anda lihat, kolom dalam file CSV diberi nama. Konstruktor dataset akan mengambil nama-nama ini secara otomatis. Jika file yang sedang Anda kerjakan tidak mengandung nama kolom di baris pertama, berikan mereka dalam daftar string ke argumen `column_names` dalam fungsi` make_csv_dataset`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "2Av8_9L3tUg1",
        "colab": {}
      },
      "source": [
        "CSV_COLUMNS = ['survived', 'sex', 'age', 'n_siblings_spouses', 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']\n",
        "\n",
        "temp_dataset = get_dataset(train_file_path, column_names=CSV_COLUMNS)\n",
        "\n",
        "show_batch(temp_dataset)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "gZfhoX7bR9u4"
      },
      "source": [
        "Contoh ini akan menggunakan semua kolom yang tersedia. Jika Anda perlu menghilangkan beberapa kolom dari dataset, buat daftar hanya kolom yang Anda rencanakan untuk digunakan, dan kirimkan ke argumen `select_columns` (opsional) dari konstruktor.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "S1TzSkUKwsNP",
        "colab": {}
      },
      "source": [
        "SELECT_COLUMNS = ['survived', 'age', 'n_siblings_spouses', 'class', 'deck', 'alone']\n",
        "\n",
        "temp_dataset = get_dataset(train_file_path, select_columns=SELECT_COLUMNS)\n",
        "\n",
        "show_batch(temp_dataset)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "9cryz31lxs3e"
      },
      "source": [
        "## Data pre-processing\n",
        "\n",
        "File CSV dapat berisi berbagai tipe data. Biasanya Anda ingin mengonversi dari berbagai tipe ke vektor dengan panjang tetap sebelum memasukkan data ke dalam model Anda.\n",
        "\n",
        "TensorFlow memiliki sistem bawaan untuk menjelaskan konversi input umum: `tf.feature_column`, lihat [tutorial ini](../keras/feature_columns) untuk detailnya.\n",
        "\n",
        "Anda dapat memproses data Anda menggunakan alat apa pun yang Anda suka (seperti [nltk](https://www.nltk.org/) atau [sklearn](https://scikit-learn.org/stable/)), dan kemudian memberikan output yang telah diproses ke TensorFlow.\n",
        "\n",
        "Keuntungan utama melakukan preprocessing di dalam model Anda adalah ketika Anda mengekspor model itu termasuk dengan proses preprocessing. Dengan cara ini Anda bisa mengirimkan data mentah langsung ke model Anda."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "9AsbaFmCeJtF"
      },
      "source": [
        "### Continuous data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Xl0Q0DcfA_rt"
      },
      "source": [
        "Jika data Anda sudah dalam format numerik yang sesuai, Anda bisa mengemas data ke dalam vektor sebelum meneruskannya ke model:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "4Yfji3J5BMxz",
        "colab": {}
      },
      "source": [
        "SELECT_COLUMNS = ['survived', 'age', 'n_siblings_spouses', 'parch', 'fare']\n",
        "DEFAULTS = [0, 0.0, 0.0, 0.0, 0.0]\n",
        "temp_dataset = get_dataset(train_file_path, \n",
        "                           select_columns=SELECT_COLUMNS,\n",
        "                           column_defaults = DEFAULTS)\n",
        "\n",
        "show_batch(temp_dataset)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "zEUhI8kZCfq8",
        "colab": {}
      },
      "source": [
        "example_batch, labels_batch = next(iter(temp_dataset)) "
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "IP45_2FbEKzn"
      },
      "source": [
        "Berikut adalah fungsi sederhana yang akan menyatukan semua\n",
        "\n",
        "---\n",
        "\n",
        "kolom:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "JQ0hNSL8CC3a",
        "colab": {}
      },
      "source": [
        "def pack(features, label):\n",
        "  return tf.stack(list(features.values()), axis=-1), label"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "75LA9DisEIoE"
      },
      "source": [
        "Terapkan ini ke setiap elemen dataset:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "VnP2Z2lwCTRl",
        "colab": {}
      },
      "source": [
        "packed_dataset = temp_dataset.map(pack)\n",
        "\n",
        "for features, labels in packed_dataset.take(1):\n",
        "  print(features.numpy())\n",
        "  print()\n",
        "  print(labels.numpy())"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "1VBvmaFrFU6J"
      },
      "source": [
        "Jika Anda memiliki data dengan berbagai tipe, Anda mungkin ingin memisahkan data simple-numeric fields. Api `tf.feature_column` dapat menanganinya, tetapi hal ini menimbulkan beberapa overhead dan harus dihindari kecuali benar-benar diperlukan. Kembali ke kumpulan data campuran:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "ad-IQ_JPFQge",
        "colab": {}
      },
      "source": [
        "show_batch(raw_train_data)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "HSrYNKKcIdav",
        "colab": {}
      },
      "source": [
        "example_batch, labels_batch = next(iter(temp_dataset)) "
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "p5VtThKfGPaQ"
      },
      "source": [
        "Jadi tentukan preprosesor yang lebih umum yang memilih daftar fitur numerik dan mengemasnya ke dalam satu kolom:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "5DRishYYGS-m",
        "colab": {}
      },
      "source": [
        "class PackNumericFeatures(object):\n",
        "  def __init__(self, names):\n",
        "    self.names = names\n",
        "\n",
        "  def __call__(self, features, labels):\n",
        "    numeric_freatures = [features.pop(name) for name in self.names]\n",
        "    numeric_features = [tf.cast(feat, tf.float32) for feat in numeric_freatures]\n",
        "    numeric_features = tf.stack(numeric_features, axis=-1)\n",
        "    features['numeric'] = numeric_features\n",
        "\n",
        "    return features, labels"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "1SeZka9AHfqD",
        "colab": {}
      },
      "source": [
        "NUMERIC_FEATURES = ['age','n_siblings_spouses','parch', 'fare']\n",
        "\n",
        "packed_train_data = raw_train_data.map(\n",
        "    PackNumericFeatures(NUMERIC_FEATURES))\n",
        "\n",
        "packed_test_data = raw_test_data.map(\n",
        "    PackNumericFeatures(NUMERIC_FEATURES))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "wFrw0YobIbUB",
        "colab": {}
      },
      "source": [
        "show_batch(packed_train_data)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "_EPUS8fPLUb1",
        "colab": {}
      },
      "source": [
        "example_batch, labels_batch = next(iter(packed_train_data)) "
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "o2maE8d2ijsq"
      },
      "source": [
        "#### Normalisasi Data\n",
        "\n",
        "Data kontinu (continues data) harus selalu dinormalisasi."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "WKT1ASWpwH46",
        "colab": {}
      },
      "source": [
        "import pandas as pd\n",
        "desc = pd.read_csv(train_file_path)[NUMERIC_FEATURES].describe()\n",
        "desc"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "cHHstcKPsMXM",
        "colab": {}
      },
      "source": [
        "MEAN = np.array(desc.T['mean'])\n",
        "STD = np.array(desc.T['std'])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "REKqO_xHPNx0",
        "colab": {}
      },
      "source": [
        "def normalize_numeric_data(data, mean, std):\n",
        "  # Center the data\n",
        "  return (data-mean)/std"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "VPsoMUgRCpUM"
      },
      "source": [
        "Sekarang buat kolom angka. API `tf.feature_columns.numeric_column` menerima argumen `normalizer_fn`, yang akan dijalankan pada setiap batch.\n",
        "\n",
        "Bundlekan `MEAN` dan `STD` ke normalizer fn menggunakan [`functools.partial`](https://docs.python.org/3/library/functools.html#functools.partial)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "Bw0I35xRS57V",
        "colab": {}
      },
      "source": [
        "# Lihat apa yang baru saja Anda buat.\n",
        "normalizer = functools.partial(normalize_numeric_data, mean=MEAN, std=STD)\n",
        "\n",
        "numeric_column = tf.feature_column.numeric_column('numeric', normalizer_fn=normalizer, shape=[len(NUMERIC_FEATURES)])\n",
        "numeric_columns = [numeric_column]\n",
        "numeric_column"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "HZxcHXc6LCa7"
      },
      "source": [
        "Saat Anda melatih model, sertakan kolom fitur ini untuk memilih dan memusatkan blok data numerik ini:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "b61NM76Ot_kb",
        "colab": {}
      },
      "source": [
        "example_batch['numeric']"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "j-r_4EAJAZoI",
        "colab": {}
      },
      "source": [
        "numeric_layer = tf.keras.layers.DenseFeatures(numeric_columns)\n",
        "numeric_layer(example_batch).numpy()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "M37oD2VcCO4R"
      },
      "source": [
        "Normalisasi berdasarkan rata-rata yang digunakan di sini mewajibkan kita untuk mengetahui rata-rata setiap kolom sebelumnya."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "tSyrkSQwYHKi"
      },
      "source": [
        "### Kategori data\n",
        "\n",
        "Beberapa kolom dalam data CSV adalah kolom kategorikal. Artinya, kontennya harus menjadi salah satu dari opsi yang ada.\n",
        "\n",
        "Gunakan API `tf.feature_column` untuk membuat koleksi dengan `tf.feature_column.indicator_column` untuk setiap kolom kategorikal."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "mWDniduKMw-C",
        "colab": {}
      },
      "source": [
        "CATEGORIES = {\n",
        "    'sex': ['male', 'female'],\n",
        "    'class' : ['First', 'Second', 'Third'],\n",
        "    'deck' : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],\n",
        "    'embark_town' : ['Cherbourg', 'Southhampton', 'Queenstown'],\n",
        "    'alone' : ['y', 'n']\n",
        "}"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "kkxLdrsLwHPT",
        "colab": {}
      },
      "source": [
        "categorical_columns = []\n",
        "for feature, vocab in CATEGORIES.items():\n",
        "  cat_col = tf.feature_column.categorical_column_with_vocabulary_list(\n",
        "        key=feature, vocabulary_list=vocab)\n",
        "  categorical_columns.append(tf.feature_column.indicator_column(cat_col))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "H18CxpHY_Nma",
        "colab": {}
      },
      "source": [
        "# See what you just created.\n",
        "categorical_columns"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "p7mACuOsArUH",
        "colab": {}
      },
      "source": [
        "categorical_layer = tf.keras.layers.DenseFeatures(categorical_columns)\n",
        "print(categorical_layer(example_batch).numpy()[0])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "R7-1QG99_1sN"
      },
      "source": [
        "Ini akan menjadi bagian dari pemrosesan input data  ketika Anda membangun model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "kPWkC4_1l3IG"
      },
      "source": [
        "### Layer preprocessing gabungan"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "R3QAjo1qD4p9"
      },
      "source": [
        "Tambahkan dua koleksi kolom fitur dan teruskan ke `tf.keras.layers.DenseFeatures` untuk membuat lapisan input yang akan mengekstraksi dan memproses kedua jenis input:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "3-OYK7GnaH0r",
        "colab": {}
      },
      "source": [
        "preprocessing_layer = tf.keras.layers.DenseFeatures(categorical_columns+numeric_columns)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "m7_U_K0UMSVS",
        "colab": {}
      },
      "source": [
        "print(preprocessing_layer(example_batch).numpy()[0])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "DlF_omQqtnOP"
      },
      "source": [
        "## Membangun Model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "lQoFh16LxtT_"
      },
      "source": [
        "Jalankan `tf.keras.Sequential`, mulai dari `preprocessing_layer`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "3mSGsHTFPvFo",
        "colab": {}
      },
      "source": [
        "model = tf.keras.Sequential([\n",
        "  preprocessing_layer,\n",
        "  tf.keras.layers.Dense(128, activation='relu'),\n",
        "  tf.keras.layers.Dense(128, activation='relu'),\n",
        "  tf.keras.layers.Dense(1, activation='sigmoid'),\n",
        "])\n",
        "\n",
        "model.compile(\n",
        "    loss='binary_crossentropy',\n",
        "    optimizer='adam',\n",
        "    metrics=['accuracy'])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "hPdtI2ie0lEZ"
      },
      "source": [
        "## Latih, evaluasi, dan prediksi"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8gvw1RE9zXkD"
      },
      "source": [
        "Sekarang model dapat dipakai dan dilatih."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "sW-4XlLeEQ2B",
        "colab": {}
      },
      "source": [
        "train_data = packed_train_data.shuffle(500)\n",
        "test_data = packed_test_data"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "Q_nm28IzNDTO",
        "colab": {}
      },
      "source": [
        "model.fit(train_data, epochs=20)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "QyDMgBurzqQo"
      },
      "source": [
        "Setelah model dilatih, Anda dapat memeriksa akurasinya pada set `test_data`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "eB3R3ViVONOp",
        "colab": {}
      },
      "source": [
        "test_loss, test_accuracy = model.evaluate(test_data)\n",
        "\n",
        "print('\\n\\nTest Loss {}, Test Accuracy {}'.format(test_loss, test_accuracy))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "sTrn_pD90gdJ"
      },
      "source": [
        "Gunakan `tf.keras.Model.predict` untuk menyimpulkan pada label batch atau dataset batch.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "Qwcx74F3ojqe",
        "colab": {}
      },
      "source": [
        "predictions = model.predict(test_data)\n",
        "\n",
        "# Tampilkan beberapa hasil\n",
        "for prediction, survived in zip(predictions[:10], list(test_data)[0][1][:10]):\n",
        "  print(\"Predicted survival: {:.2%}\".format(prediction[0]),\n",
        "        \" | Actual outcome: \",\n",
        "        (\"SURVIVED\" if bool(survived) else \"DIED\"))"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}